Using Machine Learning to Predict Outcomes for Documents

ABSTRACT

Evaluations of a document are generated that indicate likelihoods of the document achieving its objectives. The evaluations are based on predictive characteristics of one or more outcomes of the client document that are indicative of whether the document will achieve its objectives. Specifically, a server receives the document from a client device. The server extracts a set of features from the document. The evaluations of the document are generated based on the predictive characteristics for the one or more outcomes of the document. The generated evaluations are provided to the client device such that the document can be optimized to achieve its desired objectives. The optimized document may also be sent to a posting server for posting to a computer network. The known outcomes of the optimized document are collected through reader responses to the document and analyzed to improve evaluations for other documents.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 16/794,532, filed on Feb. 19, 2020, which is a continuation application of U.S. patent application Ser. No. 15/161,151, filed on May 20, 2016, which claims the benefit of U.S. Provisional Application No. 62/166,598, filed on May 26, 2015, each of which is incorporated by reference in its entirety.

BACKGROUND

Field of Disclosure

The present invention generally relates to analyzing documents, and more specifically to ways of analyzing documents to predict outcomes for documents posted on a computer network.

Description of the Related Art

Many documents are posted on a computer network with the desired objective of inducing responses to the document from certain types of readers. For example, a recruiting document is written with the objective of inducing qualified applicants to fill a job vacancy. The outcomes of a document are characteristics of reader responses to the document and are indicative of whether the document will achieve its objective. For example, outcomes of the recruiting document may include the number of applicants or the proportion of qualified applicants, both of which are indicative of whether the document will achieve its objective of filling a job vacancy with a qualified applicant. Therefore, it is advantageous to tailor and optimize the document before posting the document to induce desired outcomes that will help the document achieve its objective. However, it is difficult for a document author to know which types of outcomes the document will produce before it is posted on the network.

SUMMARY

The above and other issues are addressed by a method, computer-readable medium, and analysis server for evaluating an electronic document with respect to an objective. An embodiment of the method comprises receiving the electronic document from a client device via a computer network. The electronic document has content directed toward achieving an objective. The method comprises extracting a set of features from the content of the electronic document. The method also comprises evaluating the features in the set using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the content of the electronic document and the objective to which the content of the document is directed to predict an outcome of the electronic document with respect to the objective. The method further comprises providing the predicted outcome to the client device.

An embodiment of the medium includes a non-transitory computer-readable medium storing executable computer program instructions for evaluating an electronic document with respect to an objective. The computer program instructions comprise receiving the electronic document from a client device via a computer network. The electronic document has content directed toward achieving an objective. The instructions comprise extracting a set of features from the content of the electronic document. The instructions also comprise evaluating the features in the set using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the content of the electronic document and the objective to which the content of the document is directed to predict an outcome of the electronic document with respect to the objective. The instructions further comprise providing the predicted outcome to the client device.

An embodiment of the analysis server comprises a non-transitory computer-readable storage medium storing executable computer program instructions and a processor for executing the instructions. The computer program instructions comprise receiving the electronic document from a client device via a computer network. The electronic document has content directed toward achieving an objective. The instructions comprise extracting a set of features from the content of the electronic document. The instructions also comprise evaluating the features in the set using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the content of the electronic document and the objective to which the content of the document is directed to predict an outcome of the electronic document with respect to the objective. The instructions further comprise providing the predicted outcome to the client device.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a high-level block diagram illustrating an embodiment of an environment for optimizing a document to achieve its desired objectives, according to one embodiment.

FIG. 2 is a high-level block diagram illustrating an example computer for implementing the client device, the analysis server, and/or the posting server of FIG. 1.

FIG. 3 is a high-level block diagram illustrating a detailed view of the document analysis module of the analysis server, according to one embodiment.

FIG. 4 is an example user interface for an input document displaying evaluation results and phrase highlights for the input document.

FIG. 5 is a flowchart illustrating a process of generating an evaluation for a document, according to one embodiment.

FIG. 6 is a flowchart illustrating a process of generating trained models used to generate an evaluation for a document, according to one embodiment.

DETAILED DESCRIPTION

The Figures (FIGS.) and the following description describe certain embodiments by way of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein. Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality.

FIG. 1 is a high-level block diagram illustrating an embodiment of an environment 100 for optimizing a document to achieve its desired objectives, according to one embodiment. The environment includes a client device 110 connected by a network 122 to an analysis server 126 and a posting server 134. Here, only one client device 110, one analysis server 126, and one posting server 134 are illustrated, but there may be multiple instances of each of these entities. For example, there may be thousands or millions of client devices 110 in communication with one or more analysis servers 126 or posting servers 134.

The network 122 provides a communication infrastructure between the client devices 110, the analysis server 126, and the posting server 134. The network 122 is typically the Internet, but may be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile wired or wireless network, a private network, or a virtual private network.

The client device 110 is a computing device such as a smartphone with an operating system such as ANDROID® or APPLE® IOS®, a tablet computer, a laptop computer, a desktop computer, or any other type of network-enabled device. A typical client device 110 includes the hardware and software needed to connect to the network 122 (e.g., via Wi-Fi and/or 4G or other wireless telecommunication standards).

The client device 110 includes a document input module 114 that allows the user of the client device 110 to interact with the analysis server 126 and the posting server 134. The document input module 114 allows the user to input a document as formatted text, and forwards the document to the analysis server 126 for evaluation or to the posting server 134 for posting to the computer network 122. The document input module 114 also presents any feedback data from the analysis server 126 or the posting server 134 back to the user of the client device 110. A client device 110 may also be used by a reader of a posted document to respond to the posting.

In one embodiment, the document input module 114 includes a browser that allows a user of the client device 110 to interact with the analysis server 126 and the posting server 134 using standard Internet protocols. In another embodiment, the document input module 114 includes a dedicated application specifically designed (e.g., by the organization responsible for the analysis server 126 or the posting server 134) to enable interactions among the client device 110 and the servers. In one embodiment, the document input module 114 includes a user interface 118 that allows the user of the client device 110 to edit and format the document and also presents feedback data about the document from the analysis server 126 or the posting server 134 to the client device 110.

Generally, the content of the document includes text written and formatted by an author directed towards achieving one or more desired objectives when presented to readers. A document may be classified into different types depending on its primary objective. For example, a document may be classified as a recruiting document when the primary objective of the document is to gather candidates to fill a vacant job position at a business organization. As another example, the document may be classified as a campaign speech when the primary objective of the document is to relay a political message of a candidate running for government office to gather a high number of votes for an election.

The analysis server 126 includes a document analysis module 130 that extracts a set of features from an input document, analyzes the features, and outputs evaluations of the document that indicate likelihoods of whether the document will achieve a defined set of objectives, including its primary objective. Each evaluation may be associated with a specific objective of the document. For example, one evaluation may be a favorability score that indicates the likelihood a recruiting document will achieve its objective of filling a vacant job position with a qualified applicant. As another example, an evaluation may be a likelihood that the recruiting document will achieve its objective of receiving gender neutral responses, indicating no gender bias. Each evaluation may be based on one or more predicted outcomes of the document, which are predicted characteristics of reader responses to the document.

The set of objectives for an input document is defined based on the type of the input document and indicates common goals that authors of that type of document are interested in achieving, and may include objectives relating to demographic information of people responding to the document. The set of objectives may differ across different types of documents due to different desired outcomes. For example, an author of a campaign speech may be interested in the objective of collecting a high number of votes for the political candidate, but may also be interested in additional demographic objectives such as gathering votes from a certain location, or gathering votes from people with a certain socio-economic background. As another example, an author of a recruiting document may be interested in the objective of collecting a high number of applicants for a vacant job opening, but may also be interested in additional recruiting objectives such as hiring a candidate with an engineering background, which may not be of interest to the author of the campaign speech document.

In one embodiment, an administrator of the analysis server 126 specifies the set of objectives for an input document depending on its type. In another embodiment, the analysis server 126 may present a large set of potential objectives for the input document to the user of the client device 110, and the user may select a subset of the potential objectives for which the document analysis module 130 would perform the evaluations.

The results of the evaluations are provided back to the client device 110 and the document may be automatically or manually optimized based on the evaluations to improve its likelihood of achieving its desired objectives. Each evaluation may be presented in various forms, such as a numerical score, a scale, or a plot, but is not limited thereto.

The posting server 134 includes a document posting module 138 that posts the optimized document and receives outcome data on the optimized document. For example, the document posting module 138 may post a recruiting document optimized based on the evaluations received from the document analysis module 130. After the document has been posted, the document posting module 138 may receive applications for the posted position, as well as outcome data describing characteristics of people who responded to the document. The collected outcome data may be provided to the document analysis module 130 in order to refine evaluations on other documents, and also may be provided back to the client device 110.

Thus, the environment 100 shown in FIG. 1 optimizes a document to achieve its desired objectives by providing evaluations of the document and tailoring the document based on the evaluations. The environment 100 also posts the document and collects outcome data for the document that can be used to refine evaluations on other documents.

FIG. 2 is a high-level block diagram illustrating an example computer 200 for implementing the client device 110, the analysis server 126, and/or the posting server 134 of FIG. 1. The computer 200 includes at least one processor 202 coupled to a chipset 204. The chipset 204 includes a memory controller hub 220 and an input/output (I/O) controller hub 222. A memory 206 and a graphics adapter 212 are coupled to the memory controller hub 220, and a display 218 is coupled to the graphics adapter 212. A storage device 208, an input device 214, and a network adapter 216 are coupled to the I/O controller hub 222. Other embodiments of the computer 200 have different architectures.

The storage device 208 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 206 holds instructions and data used by the processor 202. The input device 214 is a touch-screen interface, a mouse, track ball, or other type of pointing device, a keyboard, or some combination thereof, and is used to input data into the computer 200. The graphics adapter 212 displays images and other information on the display 218. The network adapter 216 couples the computer 200 to one or more computer networks.

The computer 200 is adapted to execute computer program modules for providing functionality described herein. As used herein, the term “module” refers to computer program logic used to provide the specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 208, loaded into the memory 206, and executed by the processor 202.

The types of computers 200 used by the entities of FIG. 1 can vary depending upon the embodiment and the processing power required by the entity. The computers 200 can lack some of the components described above, such as graphics adapters 212 and displays 218. For example, the analysis server 126 can be formed of multiple blade servers communicating through a network such as in a server farm.

FIG. 3 is a high-level block diagram illustrating a detailed view of the document analysis module 130 of the analysis server 126, according to one embodiment. The document analysis module 130 is comprised of modules including a data storage module 350, a corpus management module 306, a factor extraction module 310, a phrase extraction module 314, a training module 318, a weighting module 322, an evaluation module 326, a display module 330, and a response verification module 334. Some embodiments of the document analysis module 130 have different modules than those described here. Similarly, the functions can be distributed among the modules in a different manner than is described here.

The data storage module 350 stores data used by the document analysis module 130. The data include a training corpus 354, metadata and factors 358, phrase-related features 362, and weights 366. The training corpus 354 is a collection of documents that are presented to readers and are associated with a set of known outcomes.

The corpus management module 306 generates, maintains, and updates the training corpus 354. The corpus management module 306 may collect documents in the training corpus 354, as well as their outcomes, from various sources. In one instance, the corpus management module 306 collects documents that were previously posted and presented to readers, and have a set of known outcomes. These documents may include documents posted, and corresponding outcome data received, by the posting server 134. The corpus management module 306 may also collect documents by crawling websites on the network, or may be provided with such documents by entities such as business organizations. In another instance, the corpus management module 306 automatically collects outcome data on currently posted documents through techniques such as user telemetry. In one embodiment, the corpus management module 306 continuously and automatically updates the training corpus 354 as new documents with a set of known outcomes are received from various sources or the posting server 134.

In one embodiment, the set of outcomes associated with documents includes characteristics of reader responses to the document. The characteristics may describe the number of responses, types of responses, timeliness of responses, and demographic information about the responders. The demographic information relates to any specific characteristics of the responders, and may relate to, for example, the gender, ethnicity, qualification levels, titles, and personality traits (e.g., introverted or extroverted) of the responders. For example, possible outcomes for a recruiting document may include the number of applicants, time required to fill the vacant position described in the document, proportion of qualified applicants, proportion of male versus female applicants, current job titles of the applicants, and proportion of applicants under a certain age, but are not limited thereto.

The set of outcomes is indicative of whether the document will achieve its desired objectives in the future. Returning to the example of a recruiting document, the number of applicants or the proportion of qualified applicants may be highly indicative of whether the document will achieve its objective of hiring a qualified candidate for a vacant position. As another example, the proportion of female versus male applicants is highly indicative of whether the document will achieve its objective of acquiring gender neutral responses. Thus, achieving desired outcomes for a document is directly related to achieving the desired objectives for the document. The values of the set of outcomes for documents in the training corpus 354 are already known, as the documents have previously been posted on a computer network.

The factor extraction module 310 and the phrase extraction module 314 each extract sets of features from documents. The sets of features uniquely characterize a document and are features of a document that are expected to correlate with outcomes of the document. As described in more detail below, the sets of features include metadata, linguistic factors, and phrase-related features of a document. The identified features may be different for each document type as the relevant outcomes and objectives are different between each type of document.

The factor extraction module 310 extracts metadata and linguistic factors from documents. The metadata includes information about the document other than the content of the document. This may include, for example, the title of the document, geographical locations associated with the document, or any other data that may correlate with the outcomes of the document. For example, the factor extraction module 310 may extract the job location for a recruiting document, as well as the industry and field of the job in the recruiting document.

The linguistic factors include syntactic, structural, and semantic factors extracted from the content of the documents. Syntactic factors relate to the set of rules, principles, and processes that govern the structure of sentences in the documents. For example, syntactic factors include the proportion and frequency of verbs, the average length of phrases, clauses, sentences, n-grams and how they are assembled together, sentence complexity, and speech density, but are not limited thereto. Structural factors relate to the structure and layout of the document content. For example, structural factors may include word counts, sentence counts, character counts, the proportion of text in italic font, and the proportion of content in bulleted lists, but are not limited thereto. Semantic factors relate to the meaning of words, phrases, and sentences in the documents based on their ordering. For example, whether a company's recruiting document contains an equal opportunity statement is a semantic factor.
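As an illustration only, a few of the structural factors named above could be computed along the following lines. This is a minimal sketch, not the extraction logic of the factor extraction module 310: the function name `structural_factors`, the tokenization rules, and the convention that bulleted lines begin with "-", "*", or "•" are all assumptions made for the example.

```python
import re

WORD_PATTERN = re.compile(r"[A-Za-z0-9']+")

def structural_factors(text: str) -> dict:
    """Compute a few structural factors of a plain-text document.

    Assumption: a bulleted line is any line whose first non-space
    character is "-", "*", or "•". Real documents would need a
    format-aware parser.
    """
    words = WORD_PATTERN.findall(text)
    # Crude sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    bullet_lines = [
        line for line in text.splitlines()
        if line.lstrip().startswith(("-", "*", "•"))
    ]
    bullet_words = sum(len(WORD_PATTERN.findall(line)) for line in bullet_lines)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "character_count": len(text),
        # Share of the document's words that sit inside bulleted lines.
        "bullet_proportion": bullet_words / len(words) if words else 0.0,
    }

sample = "We are hiring engineers. Apply today!\n- Write code\n- Review code\n"
factors = structural_factors(sample)
```

Each returned value is one candidate feature; which of them actually correlates with an outcome is left to the trained models.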

In one embodiment, the factor extraction module 310 generates semantic factor models for determining the presence and strength of semantic factors in documents given a set of predetermined syntactic and structural factors of the documents. For example, the factor extraction module 310 may generate a model that determines the presence and strength of an equal opportunity statement in a document given certain n-grams or significant phrases (e.g., “equal,” “opportunity,” “male,” “female”) of the document.

The presence of a semantic factor is a feature indicating whether a semantic factor is present in documents, and is represented by a binary value (e.g., 0 or 1, “existing equal opportunity statement” or “no equal opportunity statement”) to indicate the presence. The strength of a semantic factor is a feature indicating the degree of presence of the semantic factor, and may be represented by discrete values (e.g., “very strong equal opportunity statement,” “strong equal opportunity statement,” “weak equal opportunity statement”) or may be represented by continuous numerical values (e.g., number on a scale from 0 to 100 or confidence levels). In one instance, the model may be determined by analyzing labeled documents in the training corpus 354 which are labeled with known values for the presence and strength of semantic factors in the documents.

The factor extraction module 310 extracts metadata and linguistic factors from documents in the training corpus 354 and stores them as metadata and factors 358 in the data storage module 350, along with the semantic factor models.

The phrase extraction module 314 extracts phrase-related features from documents. The phrase-related features indicate the presence of distinctive phrases associated with a category and their level of association to the category. For example, the set of distinctive phrases “the Apple, Big Apple, upstate, Empire State, Broadway,” may be associated with the category “phrases related to New York City.” As another example, the set of distinctive phrases “core competency, move the needle, corporate values, think outside the box, leverage, drill down,” may be associated with the category “phrases related to corporate jargon.”

The phrase extraction module 314 identifies associations between distinctive phrases and corresponding categories by analyzing documents. In one embodiment, the phrase extraction module 314 determines the associations between distinctive phrases and corresponding categories by extracting phrases that correlate with certain outcomes or metadata across documents in the training corpus 354. The categories associated with distinctive phrases may be assigned based on the outcomes or metadata of the documents.

For example, the phrase extraction module 314 may identify distinctive phrases that occur most frequently in recruiting documents with the outcome of a high proportion of female applicants and may assign the category “phrases likely to attract female candidates,” to the identified phrases. As another example, the phrase extraction module 314 may identify distinctive phrases that occur most frequently in recruiting documents in the pharmaceutical industry and may assign the category “phrases related to pharmaceuticals,” to the identified phrases. As another example, the phrase extraction module 314 may identify distinctive phrases that occur frequently in recruiting documents with the outcome of a small number of applicants and may assign the category “negative phrases,” to the identified phrases. The extracted set of negative phrases may further be divided and each assigned to separate categories of “obscene phrases,” “corporate jargon,” and “offensive language.”

Based on the associations identified above from the training corpus 354, the phrase extraction module 314 extracts phrase-related features from documents including the presence of distinctive phrases associated with a category and their level of association with the category. The presence of distinctive phrases associated with a category is a feature indicating whether a document contains any phrases that are associated with a certain category, and may be represented by a binary value. For example, for a given document, the presence of “phrases related to corporate jargon” may have a binary value of 0 or 1 depending on whether the document contains any one of the phrases “core competency, move the needle, corporate values, think outside the box, leverage, drill down.”
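The binary presence feature can be sketched as follows. The phrase list is the corporate jargon example given in this description; the dictionary name `CATEGORY_PHRASES`, the function name `phrase_presence`, and the case-insensitive whole-word matching are assumptions made for illustration and are not a description of how the phrase extraction module 314 matches phrases.

```python
import re

# Category-to-phrase associations; in the described system these would be
# learned from the training corpus rather than written by hand.
CATEGORY_PHRASES = {
    "phrases related to corporate jargon": [
        "core competency", "move the needle", "corporate values",
        "think outside the box", "leverage", "drill down",
    ],
}

def phrase_presence(document: str, category_phrases=CATEGORY_PHRASES) -> dict:
    """Return 1 for each category with at least one phrase in the document, else 0."""
    text = document.lower()
    features = {}
    for category, phrases in category_phrases.items():
        found = any(
            # \b keeps "leverage" from matching inside "leveraged".
            re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text)
            for phrase in phrases
        )
        features[category] = 1 if found else 0
    return features

with_jargon = phrase_presence("We leverage our core competency every day.")
without_jargon = phrase_presence("We build reliable software.")
```

Whole-word matching is one design choice among several; a deployed system might instead stem or lemmatize the text so that inflected forms also count.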

The level of association between distinctive phrases and their category is a feature indicating how strong an association the distinctive phrases in a document have with their corresponding category, and may be represented by discrete or numerical values. For example, for a given document having one or more distinctive phrases associated with the category “obscene phrases,” the level of association with the category may be represented by discrete levels of “not obscene,” “somewhat obscene,” “obscene,” and “very obscene,” or may be represented by continuous numerical values (e.g., number on a scale from 0 to 100 or confidence levels). In one embodiment, the level of association may also be determined based on the analysis performed by the phrase extraction module 314 on the training corpus 354.

For each document in the training corpus 354, the phrase extraction module 314 identifies phrase-related features that include the presence of any distinctive phrases in corresponding categories and the phrases' levels of association to the categories, and stores them as phrase-related features 362 in the data storage module 350. The identified associations between distinctive phrases and their corresponding categories are also stored as phrase-related features 362 in the data storage module 350.

The training module 318 generates one or more machine-learned models that predict outcomes for documents given the set of features extracted from the documents. The set of features includes the metadata, linguistic factors, and the phrase-related features identified through the factor extraction module 310 and the phrase extraction module 314. The models are generated based on the features extracted from documents in the training corpus 354 and the known outcomes of the documents in the training corpus 354. For example, the training module 318 may generate the models by correlating the set of features for each document in the training corpus 354 (stored as metadata and factors 358 and phrase-related features 362) with the corresponding known outcomes for each document. Given a set of features, the training module 318 may train individual models that predict a single outcome, or may train models that predict multiple outcomes or a combination of outcomes.

Returning to the example of a recruiting document, the training module 318 may generate a model that predicts the number of applicants for a document upon receiving a set of extracted features for the document. As discussed above, example features may be the presence of an equal opportunity statement, location of the job, and presence of “phrases related to corporate jargon.” As another example, the training module 318 may train a different model that predicts the proportion of male applicants given the same set of extracted features for the document. As another example, the training module 318 may train a single model that predicts both outcomes at once given the same set of extracted features for the document.
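One simple way to picture a single-outcome model of this kind is an ordinary least squares fit of the known outcome against the extracted features. The sketch below is an assumption for illustration: this description does not state which learning algorithm the training module 318 uses, and the function names, the two binary features, and the toy training data are all hypothetical.

```python
def fit_linear_model(feature_rows, outcomes):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in feature_rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, outcomes))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; the fit is not unique")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def predict(model, features):
    """Apply a fitted model to one document's feature vector."""
    return model[0] + sum(c * f for c, f in zip(model[1:], features))

# Hypothetical training data. Features per document:
# [equal opportunity statement present, corporate jargon present]
training_features = [[0, 0], [1, 0], [0, 1], [1, 1]]
applicant_counts = [10, 13, 5, 8]  # known outcome for each document

model = fit_linear_model(training_features, applicant_counts)
predicted = predict(model, [1, 0])
```

A second model for a different outcome, such as the proportion of male applicants, would reuse the same feature rows with a different outcome column.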

The generated models indicate a direction and degree of correlation between the features and outcomes of documents through coefficients generated for each of the features. For example, the machine-learned models may indicate directions and degrees of correlation between the presence and strength of distinctive phrases in recruiting documents and demographic information of people who respond to the recruiting documents. In one embodiment, the sign of the coefficient indicates the direction of the correlation, and the absolute value of the coefficient indicates the degree of correlation. For example, a trained model relating the set of features to the number of applicants for a document may indicate that the presence of “obscene phrases” is negatively correlated with a high degree of significance through a negative coefficient having a large absolute value. As another example, the same model may indicate that the presence of a strong equal opportunity statement is positively correlated with a high degree of significance through a positive coefficient having a large absolute value.
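Reading direction and degree off a set of coefficients can be sketched as follows. The coefficient values and the function name `describe_coefficients` are hypothetical and chosen only to mirror the examples in the preceding paragraph; they are not taken from any trained model.

```python
def describe_coefficients(coefficients: dict) -> list:
    """Return (feature, direction, degree) tuples, strongest correlation first.

    Direction comes from the sign of the coefficient and degree from its
    absolute value, as described for the trained models.
    """
    described = []
    for feature, coefficient in coefficients.items():
        if coefficient > 0:
            direction = "positive"
        elif coefficient < 0:
            direction = "negative"
        else:
            direction = "none"
        described.append((feature, direction, abs(coefficient)))
    return sorted(described, key=lambda item: item[2], reverse=True)

# Hypothetical coefficients for the outcome "number of applicants".
example_coefficients = {
    "presence of obscene phrases": -4.2,
    "strong equal opportunity statement": 3.7,
    "proportion of italic text": 0.1,
}
ranking = describe_coefficients(example_coefficients)
```

Comparing magnitudes across features in this way is only meaningful when the features are on comparable scales, which is an additional assumption of the sketch.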

Features that are statistically significant may differ across each outcome. For example, the proportion of bullet point content may have a significant correlation with the outcome of the number of applicants, but may have an insignificant correlation with the proportion of veteran applicants.

In one embodiment, the training module 318 continuously and automatically updates the trained models as the training corpus 354 is updated with new documents. By updating the models, the training module 318 is able to identify new correlations between the set of features and outcomes of a document, as well as modify or delete existing correlations to capture any changing patterns over time.

The weighting module 322 assigns a weight to each feature in the set of features for an outcome based on the trained models generated by the training module 318 for that outcome. Specifically, a weight assigned to a feature may indicate the predictive power of the feature for the outcome. Similarly to the coefficients identified in the trained models, the weight assigned for each feature may include the direction and degree of correlation between the features and an outcome, and may be represented by a positive number or a negative number.

The weights may be assigned based on the coefficients identified through the trained models, but are not required to be identical to the coefficients for that outcome. For example, the weights assigned to a set of features indicating predictive power for the outcome of the proportion of female applicants may be the coefficients identified through a corresponding trained model relating the set of features to the outcome, multiplied by a constant factor or translated. As another example, multiple features may be assigned the same numerical weight for a certain outcome if the corresponding coefficients identified through the trained model for the outcome are above or below a predetermined threshold.
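The two weighting examples above might look like the following sketch. The function names, the coefficient values, and in particular the symmetric threshold rule (one shared positive weight above the threshold, the same weight negated below its negative, zero in between) are assumptions; this description says only that features whose coefficients fall above or below a predetermined threshold may share a weight.

```python
def scale_weights(coefficients: dict, factor: float) -> dict:
    """Derive weights by multiplying every coefficient by a constant factor."""
    return {feature: value * factor for feature, value in coefficients.items()}

def threshold_weights(coefficients: dict, threshold: float, shared_weight: float) -> dict:
    """Give features the same weight when their coefficients pass a threshold.

    Assumed rule: coefficient >= threshold -> +shared_weight,
    coefficient <= -threshold -> -shared_weight, otherwise 0.
    """
    weights = {}
    for feature, value in coefficients.items():
        if value >= threshold:
            weights[feature] = shared_weight
        elif value <= -threshold:
            weights[feature] = -shared_weight
        else:
            weights[feature] = 0
    return weights

# Hypothetical coefficients for one outcome.
coefficients = {
    "equal opportunity statement": 2.5,
    "bulleted content": 3.0,
    "obscene phrases": -2.75,
    "italic text": 0.25,
}
scaled = scale_weights(coefficients, 2.0)
bucketed = threshold_weights(coefficients, threshold=2.0, shared_weight=5)
```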

The weighting module 322 may automatically update the weights as the trained models are updated by the training module 318. The weights are saved as weights 366 in the data storage module 350.

Responsive to receiving an input document from the client device 110, the evaluation module 326 extracts the set of features identified by the factor extraction module 310 and the phrase extraction module 314 from the input document, and outputs evaluations of the input document that are likelihoods of whether the document will achieve its set of defined objectives. As mentioned earlier, the defined objectives for the document may be determined based on the type of document and/or may be specified by the user that provided the document. Each evaluation is associated with an objective of a document, and is based on predictive characteristics of one or more outcomes of the input document. The predictive characteristics are identified through the trained models generated by the training module 318. The evaluations may come in the form of a numerical score or a visual scale indicating the degree of bias, but are not limited thereto.

In one embodiment, the evaluation module 326 may generate the evaluations by applying the trained models to the set of extracted features of the input document to generate predicted values for the one or more outcomes of the input document. The evaluation module 326 may combine the predicted outcomes to generate the evaluations. For example, an evaluation indicating the likelihood a recruiting document will achieve its objective of filling a vacant job position may be generated by extracting the set of features from the input document, applying the trained models for predicting the number of applicants and the proportion of qualified applicants, and combining the predicted values for the outcomes into a normalized score.

In another embodiment, the evaluation module 326 may generate the evaluations by summing the weights of the features in the input document for the one or more outcomes. As discussed above in conjunction with the weighting module 322, the weights associated with the set of features for an outcome are assigned based on the predictive power of the features for that outcome, and are identified through the trained models for that outcome. For example, for an evaluation based on the outcome of the number of applicants, the evaluation module 326 may identify the presence of “obscene phrases,” having a weight of −5 for the outcome, the presence of “phrases related to corporate jargon,” having a weight of −6 for the outcome, and the presence of an equal opportunity statement, having a weight of +3 for the outcome, in the input document. The evaluation module 326 may then generate the evaluation by scaling the sum of the weights, −8, to a normalized score.

As another example, for an evaluation of whether a document will receive a gender neutral response based on the outcome of the proportion of male versus female applicants, the evaluation module 326 may identify the presence of an equal opportunity statement, having a weight of +10 for the outcome, the presence of “phrases likely to attract females,” having a weight of −6 for the outcome, and the presence of “phrases related to Seattle,” having a weight of +1 for the outcome, in the input document. The evaluation module 326 may then output the sum of the weights, +5, to indicate the likelihood.
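The arithmetic in the two examples above can be checked with a short sketch. The weights are the ones given in the text; the helper names and, in particular, the linear mapping of a weight sum from an assumed range of −20 to +20 onto a 0 to 100 score are hypothetical, since the description does not specify how the sum is scaled.

```python
def sum_weights(present_features, weights):
    """Sum the weights of the features found in the input document."""
    return sum(weights[name] for name in present_features if name in weights)


def to_normalized_score(total, low=-20, high=20):
    """Map a weight sum onto 0..100 (assumed linear scaling, clamped to the range)."""
    clamped = max(low, min(high, total))
    return 100 * (clamped - low) / (high - low)


# Weights for the outcome "number of applicants", as given in the text.
applicant_weights = {
    "obscene phrases": -5,
    "phrases related to corporate jargon": -6,
    "equal opportunity statement": 3,
}
# Weights for the outcome "proportion of male versus female applicants".
gender_weights = {
    "equal opportunity statement": 10,
    "phrases likely to attract females": -6,
    "phrases related to Seattle": 1,
}

applicant_sum = sum_weights(applicant_weights, applicant_weights)
gender_sum = sum_weights(gender_weights, gender_weights)
print(applicant_sum, to_normalized_score(applicant_sum))  # -8 30.0
print(gender_sum)  # 5
```

The sums match the values stated in the text (−8 and +5); only the scaling step is an illustrative assumption.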

Thus, the evaluation module 326 evaluates the set of features in the input document using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the input document and the objective to which the document is directed to predict an outcome of the input document with respect to its objective.

In one instance, the evaluation module 326 may rank the set of features in the input document according to their weights for the one or more outcomes of an evaluation. In one instance, the evaluation module 326 may rank the set of features according to the absolute value of their corresponding weights, as a higher absolute value translates to a higher contribution to the outcome, and thus, a higher contribution to the evaluation.

The evaluation module 326 may filter out a subset of the features in the input document based on the rankings. In one embodiment, the filtered features may be determined by ordering the set of features according to their absolute value of weights and selecting a predetermined number or proportion of features that are ranked the highest. In another embodiment, the filtered features may be determined by ordering the set of features within each group of positive and negative weights, and selecting a predetermined number or proportion of features that are ranked the highest within each group.
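The two filtering embodiments above can be sketched as follows. The function names and the example weights are hypothetical; the sketch selects a fixed number of features, and selecting a proportion instead would only change how the cutoff is computed.

```python
def top_by_magnitude(weights, count):
    """Rank all features by absolute weight and keep the highest-ranked ones."""
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:count]


def top_per_sign(weights, count):
    """Rank positive and negative features separately and keep the top of each group."""
    positive = sorted(
        (item for item in weights.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    negative = sorted(
        (item for item in weights.items() if item[1] < 0),
        key=lambda item: item[1],  # most negative first
    )
    return positive[:count], negative[:count]


# Hypothetical weights for one outcome.
weights = {
    "obscene phrases": -5,
    "corporate jargon": -6,
    "equal opportunity statement": 3,
    "positive language": 2,
}

print(top_by_magnitude(weights, 2))
# [('corporate jargon', -6), ('obscene phrases', -5)]
print(top_per_sign(weights, 1))
# ([('equal opportunity statement', 3)], [('corporate jargon', -6)])
```

Ranking within each sign group guarantees that both favorable and unfavorable features are surfaced, whereas ranking by magnitude alone can return features of a single sign, as the first output shows.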

The filtered features may later be presented to the user of the client device 110 through the display module 330 to indicate which features contribute significantly to an evaluation. Thus, the input document may be optimized based on the filtered features to increase its likelihood of achieving its objectives. The evaluation module 326 provides the evaluation results including the evaluations and the filtered features to the display module 330 for presentation to the user of the client device 110.

The display module 330 presents the evaluation results of the input document, as well as phrase highlights in the input document to the user of the client device 110 through the user interface 118. Specifically, the display module 330 receives evaluation results generated by the evaluation module 326 for the input document and graphically presents the results in the context of the input document through the user interface 118. This includes displaying the evaluations performed on the input document and the filtered features for any of the presented evaluations.

The display module 330 also identifies distinctive phrases in the content of the input document, and, for each identified distinctive phrase, indicates the influence of the distinctive phrase on the predicted outcomes used to evaluate the input document with respect to its objectives. In one embodiment, the display module 330 identifies and highlights any distinctive phrases in the input document that have a corresponding phrase-category association as identified in phrase-related features 362 in the user interface 118. In another embodiment, the evaluation module 326 identifies distinctive phrases in the input document, and provides this information to the display module 330 such that the corresponding phrases can be highlighted in the user interface 118. The categories of the distinctive phrases are also displayed in the user interface 118.

The display module 330 enables the input document to be edited and revised through the user interface 118 to improve its likelihood of achieving its set of objectives. Specifically, the input document may be optimized based on the evaluation results presented by the display module 330. As an example, a recruiting document having an undesirable evaluation may be presented with a filtered feature indicating the presence of “obscene phrases” in the input document. Upon receiving the evaluation results, the input document may be revised to eliminate all “obscene phrases” to improve its evaluation and to improve its likelihood of achieving its desired outcomes. The revised input document is again provided to the evaluation module 326, and updated evaluation results are displayed to the user of the client device 110 by the display module 330 as the input document is being revised. In this manner, the evaluation results for an input document may automatically be updated such that the input document is optimized to target desired outcomes before being posted to a computer network.

FIG. 4 is an example user interface 118 for an input document displaying evaluation results and phrase highlights for the input document. In one embodiment, the user interface 118 is generated by the display module 330 and provided to the client device 110 for display thereon.

As shown in the example in FIG. 4, the display module 330 generates a phrase highlight 410 on the phrase “crazy” based on its associated category 414 “Masculine” in one outlined pattern. As another example, the display module 330 generates a phrase highlight on the phrase “buy-in” based on its associated category of “Repetitive” phrases.

As shown in FIG. 4, for the recruiting document of a “General Market Media Buyer,” a favorability score 418 of “72” indicating the likelihood the recruiting document will achieve its objective of filling the vacant position is generated by the evaluation module 326 and is presented on the user interface 118 by the display module 330. A set of filtered features 422 and 426 are also identified and presented for the favorability score evaluation. Specifically, the set of filtered features contain features with positive weight 422 that contributed to increasing the favorability score. These include, for example, features relating to the use of positive language and the length of the document. The set of filtered features also contain features with negative weight 426 that contributed to decreasing the favorability score. These include, for example, features relating to using corporate jargon and a missing equal opportunity statement in the document.

As another example shown in FIG. 4, the evaluation 430 indicating the likelihood of gender neutral responses is displayed in the form of a scale. The left-most side corresponds to a low likelihood of achieving gender neutral responses due to a high prediction of male responses, and the right-most side corresponds to a low likelihood of achieving gender neutral responses due to a high prediction of female responses. Although not shown in FIG. 4, the filtered features for this particular evaluation may include, for example, features relating to the presence and strength of “female” phrases, the presence and strength of “male” phrases, and the presence and strength of an equal opportunity statement.

Once the document has been optimized, the document input module 114 may provide the optimized input document to the document posting module 138 in the posting server 134. As discussed above, the document posting module 138 posts optimized documents to a computer network and collects outcome data for the posted documents, and provides this information back to the response verification module 334 for further analysis.

Returning to the document analysis module 130 shown in FIG. 3, the response verification module 334 verifies whether the evaluations generated by the document analysis module 130 are reliable based on the outcome data received from the document posting module 138 on documents that have already been evaluated, optimized, and posted. The response verification module 334 provides the optimized documents and outcome data to other modules of the document analysis module 130 for improving evaluations on other documents.

The response verification module 334 compares the actual outcomes for the optimized documents to their corresponding evaluations to determine whether the document analysis module 130 is generating reliable evaluations with respect to the given objectives. In one embodiment, the response verification module 334 uses predetermined thresholds to evaluate whether an evaluation is considered “reliable.” For example, if the evaluation indicates a high likelihood that the document will attract gender-neutral responses (e.g., an equal proportion of male and female responders), the response verification module 334 may apply a predetermined threshold to the actual gender outcome for the document to determine whether the evaluation was reliable. The threshold may indicate that the responses were not gender-neutral, and hence the evaluation was not reliable, if more than, e.g., 65% of responders have the same gender.
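The threshold check in this example can be sketched as follows, using the 65% figure from the text. The function names are hypothetical, and treating a document with no responders as not gender-neutral is an assumption made so the sketch handles the empty case.

```python
def is_gender_neutral(male_responders, female_responders, max_share=0.65):
    """Return True unless more than max_share of responders share one gender."""
    total = male_responders + female_responders
    if total == 0:
        # Assumption: with no responses, neutrality cannot be confirmed.
        return False
    return max(male_responders, female_responders) / total <= max_share


def evaluation_was_reliable(predicted_neutral, male_responders, female_responders):
    """An evaluation is reliable when its prediction matches the actual outcome."""
    actual_neutral = is_gender_neutral(male_responders, female_responders)
    return predicted_neutral == actual_neutral


# The evaluation predicted gender-neutral responses in both cases.
print(evaluation_was_reliable(True, 60, 40))  # True: 60% is within the threshold
print(evaluation_was_reliable(True, 70, 30))  # False: 70% exceeds the threshold
```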

In one instance, the response verification module 334 periodically provides the verification information to the other modules of the document analysis module 130 when new optimized documents and their outcome data are received from the document posting module 138. In another instance, the response verification module 334 automatically provides this information when the verification performed by the response verification module 334 indicates that the document analysis module 130 is increasingly generating unreliable evaluations. For example, this may occur when the response verification module 334 determines that a proportion of optimized documents above a specified threshold are generating unreliable evaluations with respect to some or all outcomes.
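The automatic trigger described above amounts to comparing the share of unreliable evaluations against a threshold. The sketch below assumes a threshold of 20% and a hypothetical function name; the description does not give a value.

```python
def should_trigger_update(reliability_flags, max_unreliable_share=0.2):
    """Return True when the share of unreliable evaluations exceeds the threshold.

    reliability_flags holds one boolean per posted document
    (True means the evaluation was reliable).
    """
    if not reliability_flags:
        return False
    unreliable = sum(1 for reliable in reliability_flags if not reliable)
    return unreliable / len(reliability_flags) > max_unreliable_share


print(should_trigger_update([True, True, False, False]))  # True: half are unreliable
print(should_trigger_update([True] * 9 + [False]))        # False: one in ten
```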

In one instance, the response verification module 334 provides the optimized documents and the outcome data to the corpus management module 306 such that the corpus management module 306 may improve the content of the training corpus 354. For example, the corpus management module 306 may add the optimized documents and corresponding outcome data to the training corpus 354. An updated set of features, models, and weights may be generated based on the updated training corpus 354, such that new trends or patterns are extracted by the various modules of the document analysis module 130. Similarly, the corpus management module 306 may delete documents in the training corpus 354 and replace them with the optimized document and outcome data received from the document posting module 138.

In another instance, the response verification module 334 provides the optimized documents and the outcome data to the factor extraction module 310 and/or the phrase extraction module 314 such that the modules may improve the set of features. For example, the phrase extraction module 314 may identify the presence of the phrase “Artificial Intelligence” as frequently occurring in documents with high evaluations and high desired outcomes, and may update the set of features to include this feature. As another example, the factor extraction module 310 and the phrase extraction module 314 may identify features that are contributing to unreliable evaluations and may delete these features from the set of features.

In this manner, input documents may go through an automated cycle of being evaluated, optimized, posted, and re-evaluated for improving evaluations on other documents. Specifically, an input document is provided to the document analysis module 130, evaluated based on its set of defined objectives and optimized based on the generated evaluations. The optimized document is posted to a computer network through the document posting module 138, and actual outcome data on the optimized document is collected and provided back to the document analysis module 130 to improve the evaluations of other documents.

FIG. 5 is a flowchart illustrating a process of generating an evaluation for a document, according to one embodiment. In one embodiment, the process of FIG. 5 is performed by the analysis server 126. Other entities may perform some or all of the steps of the process in other embodiments. Likewise, embodiments may include different and/or additional steps, or perform the steps in different orders.

A client document is received 502 from a client device. The client document includes content directed towards achieving an objective. A set of features is extracted 504 from the content of the client document. The features in the set of features are evaluated 506 using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the content of the client document and the objective to which the content of the document is directed to predict an outcome of the client document with respect to its objective. The predicted outcome is provided 508 to the client device.

In one embodiment, the evaluated client document is sent 510 to a posting server 134 such that the client document can be posted on the computer network by the posting server. Readers may respond to the posted document. The outcome data describing responses to the posting of the client document with respect to the objective are received 512 from the posting server 134. The machine-learned models are selectively revised 514 based on the received outcome data.

FIG. 6 is a flowchart illustrating a process of generating machine-learned models used to generate an evaluation for a document, according to one embodiment. In one embodiment, the process of FIG. 6 is performed by the analysis server 126. Other entities may perform some or all of the steps of the process in other embodiments. Likewise, embodiments may include different and/or additional steps, or perform the steps in different orders.

A training corpus of documents is generated 602 by gathering electronic documents and associated known outcome data describing known outcomes resulting from postings of the electronic documents on the network. The set of features is extracted 604 from contents of each of the documents in the training corpus. One or more machine-learned models are generated 606 by correlating the sets of features extracted from the contents of the documents with the associated known outcome data. In one embodiment, weights are assigned 608 to the set of features based on the one or more machine-learned models.
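As a rough illustration of step 606, the sketch below correlates each feature with a known outcome across a toy corpus. It substitutes a per-feature Pearson correlation for the machine-learned models, which the description does not specify in this section, so it should be read as a simplified stand-in. The function names, feature names, and corpus values are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: the sign gives direction, the magnitude gives degree."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return covariance / sqrt(var_x * var_y)


def correlate_features(corpus):
    """corpus: list of (features dict, known outcome value) pairs."""
    names = sorted({name for features, _ in corpus for name in features})
    outcomes = [outcome for _, outcome in corpus]
    return {
        name: pearson([features.get(name, 0.0) for features, _ in corpus], outcomes)
        for name in names
    }


# Invented toy corpus: feature presence (1 or 0) and number of applicants.
corpus = [
    ({"obscene_phrases": 1, "equal_opportunity_statement": 0}, 5),
    ({"obscene_phrases": 0, "equal_opportunity_statement": 1}, 40),
    ({"obscene_phrases": 0, "equal_opportunity_statement": 1}, 35),
    ({"obscene_phrases": 1, "equal_opportunity_statement": 0}, 10),
]

correlations = correlate_features(corpus)
print(correlations)
```

In this toy corpus the obscene-phrase feature comes out negatively correlated with the number of applicants and the equal opportunity statement positively correlated, mirroring the direction-and-degree reading of coefficients described earlier.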

OTHER CONSIDERATIONS

Some portions of the above description describe the embodiments in terms of algorithmic processes or operations. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs comprising instructions for execution by a processor or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of functional operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.

As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

In addition, use of “a” or “an” is employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the disclosure. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.

Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for generating evaluations of documents based on one or more outcomes of the document. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the described subject matter is not limited to the precise construction and components disclosed herein and that various modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus disclosed herein.

CLAIMS

1. A method of evaluating an electronic document with respect to an objective, comprising: receiving the electronic document from a client device via a computer network, the electronic document having content directed toward achieving the objective; extracting a set of features from the content of the electronic document, the extracted features including phrase-related features indicating presence of distinctive phrases in the content of the electronic document, categories corresponding to the distinctive phrases, and the distinctive phrases' levels of association with the corresponding categories; evaluating the features in the set using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the content of the electronic document and the objective to which the content of the document is directed to predict an outcome of the electronic document with respect to the objective, evaluating the features comprising: assigning weights to the features in the set, a weight assigned to a feature indicating a direction and degree of correlation between the feature and the predicted outcome of the electronic document, and combining the weights of the set of features to predict the outcome of the electronic document; and providing the predicted outcome for display on a user interface of the client device, wherein providing the predicted outcome comprises identifying the plurality of distinctive phrases in the content of the electronic document and, for each identified distinctive phrase, indicating an influence of the distinctive phrase on the predicted outcome of the electronic document with respect to the objective.
2. The method of claim 1, wherein a sign of a weight assigned to a feature indicates the direction of correlation between the feature and the predicted outcome of the electronic document, and an absolute value of the weight assigned to the feature indicates the degree of correlation between the feature and the predicted outcome.
3. The method of claim 1, wherein a level of association of a distinctive phrase with a corresponding category is represented by a numerical value.
4. The method of claim 1, wherein providing the predicted outcome further comprises identifying a category association of each identified distinctive phrase.
5. The method of claim 1, further comprising: establishing a training corpus of electronic documents and associated known outcome data describing known outcomes resulting from postings of the electronic documents on the computer network; extracting a set of training features from contents of the electronic documents in the training corpus; and generating the one or more machine-learned models by correlating the sets of training features extracted from the contents of the electronic documents in the training corpus and the associated known outcome data.
6. The method of claim 5, wherein assigning weights to the features in the set is based on coefficients identified through the one or more machine-learned models.
7. The method of claim 1, further comprising: posting the electronic document on the computer network; receiving outcome data describing responses to the posting of the electronic document on the computer network with respect to the objective; and selectively revising the machine-learned models based on the outcome data.
8. The method of claim 1, wherein extracting the set of features from the content of the electronic document further comprises: extracting syntactic factors describing a structure of sentences in the content of the electronic document; extracting structural factors relating to structure and layout of the content of the electronic document; and extracting semantic factors relating to meaning of the content in the electronic document.
9. The method of claim 1, wherein the electronic document is a recruiting document, the objective relates to demographic information of people responding to the recruiting document, the predicted outcome predicting characteristics of reader responses to the electronic document, and indicating a likelihood that the electronic document will achieve the objective.
10. The method of claim 1, wherein providing the predicted outcome for display on a user interface of the client device further comprises providing a favorability score indicating a likelihood that the electronic document will achieve the objective.
11. A non-transitory computer-readable storage medium storing computer program instructions executable to perform operations for evaluating an electronic document with respect to an objective, the operations comprising: receiving the electronic document from a client device via a computer network, the electronic document having content directed toward achieving the objective; extracting a set of features from the content of the electronic document, the extracted features including phrase-related features indicating presence of distinctive phrases in the content of the electronic document, categories corresponding to the distinctive phrases, and the distinctive phrases' levels of association with the corresponding categories; evaluating the features in the set using one or more machine-learned models that indicate directions and degrees of correlation between the features extracted from the content of the electronic document and the objective to which the content of the document is directed to predict an outcome of the electronic document with respect to the objective, evaluating the features comprising: assigning weights to the features in the set, a weight assigned to a feature indicating a direction and degree of correlation between the feature and the predicted outcome of the electronic document, and combining the weights of the set of features to predict the outcome of the electronic document; and providing the predicted outcome for display on a user interface of the client device, wherein providing the predicted outcome comprises identifying the plurality of distinctive phrases in the content of the electronic document and, for each identified distinctive phrase, indicating an influence of the distinctive phrase on the predicted outcome of the electronic document with respect to the objective.
12. The computer-readable medium of claim 11, wherein a sign of a weight assigned to a feature indicates the direction of correlation between the feature and the predicted outcome of the electronic document, and an absolute value of the weight assigned to the feature indicates the degree of correlation between the feature and the predicted outcome.
13. The computer-readable medium of claim 11, wherein a level of association of a distinctive phrase with a corresponding category is represented by a numerical value.
14. The computer-readable medium of claim 11, wherein providing the predicted outcome further comprises identifying a category association of each identified distinctive phrase.
15. The computer-readable medium of claim 11, wherein the operations further comprise: establishing a training corpus of electronic documents and associated known outcome data describing known outcomes resulting from postings of the electronic documents on the computer network; extracting a set of training features from contents of the electronic documents in the training corpus; and generating the one or more machine-learned models by correlating the sets of training features extracted from the contents of the electronic documents in the training corpus and the associated known outcome data.
16. The computer-readable medium of claim 15, wherein assigning weights to the features in the set is based on coefficients identified through the one or more machine-learned models.
17. The computer-readable medium of claim 11, wherein the operations further comprise: posting the electronic document on the computer network; receiving outcome data describing responses to the posting of the electronic document on the computer network with respect to the objective; and selectively revising the machine-learned models based on the outcome data.
18. The computer-readable medium of claim 11, wherein extracting the set of features from the content of the electronic document further comprises: extracting syntactic factors describing a structure of sentences in the content of the electronic document; extracting structural factors relating to structure and layout of the content of the electronic document; and extracting semantic factors relating to meaning of the content in the electronic document.
19. The computer-readable medium of claim 11, wherein the electronic document is a recruiting document, the objective relates to demographic information of people responding to the recruiting document, the predicted outcome predicting characteristics of reader responses to the electronic document, and indicating a likelihood that the electronic document will achieve the objective.
20. The computer-readable medium of claim 11, wherein providing the predicted outcome for display on a user interface of the client device further comprises providing a favorability score indicating a likelihood that the electronic document will achieve the objective.